Using Web Page Titles to Rediscover Lost Web Pages

Authors

  • Jeffery L. Shipman
  • Martin Klein
  • Michael L. Nelson

Abstract

Titles are denoted by the TITLE element within a web page. We queried the title against the Yahoo search engine to determine the page's status (found or not found). We conducted several tests based on elements of the title to discern whether we could predict a page's status from its title. Our results increase our ability to identify bad titles but not our ability to identify good titles.
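
The procedure described above (take the TITLE element, use it as a query, label the page found or not found) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the page's HTML is already available and that the search results have already been retrieved as a list of URLs, since no Yahoo API is called here. The names `extract_title` and `page_status`, the trailing-slash URL comparison, and the `top_k=10` cutoff are hypothetical choices made for the example.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text inside the first <title> element of an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <TITLE> and <title> both match.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)

    @property
    def title(self):
        # Collapse internal whitespace so the title can be used as a query string.
        return " ".join("".join(self.parts).split())


def extract_title(html):
    """Return the whitespace-normalized text of the page's first TITLE element."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title


def page_status(target_url, result_urls, top_k=10):
    """Label a page 'found' if its URL is among the first top_k results.

    Hypothetical rule: result_urls stands in for the results of a title query,
    and top_k=10 is an illustrative cutoff, not the paper's setting.
    """
    target = target_url.rstrip("/")
    for url in result_urls[:top_k]:
        if url.rstrip("/") == target:
            return "found"
    return "not found"
```

For example, `extract_title("<TITLE>  Lost   Pages\n Demo </TITLE>")` yields `"Lost Pages Demo"`, which would then be submitted as the query; `page_status` only compares URLs after stripping a trailing slash, so a real study would likely need more careful URL normalization.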

Similar resources

Using the Web Infrastructure for Real Time Recovery of Missing Web Pages

Martin Klein, Old Dominion University, 2011. Director: Dr. Michael L. Nelson. Given the dynamic nature of the World Wide Web, missing web pages, or “404 Page not Found” responses, are part of our web browsing experience. It is our intuition that information on the web is rarely completely lost, it is just missing. In whole or...

Investigating the Change of Web Pages' Titles Over Time

Inaccessible web pages are part of the browsing experience. The content of these pages however is often not completely lost but rather missing. Lexical signatures (LS) generated from the web pages’ textual content have been shown to be suitable as search engine queries when trying to discover a (missing) web page. Since LSs are expensive to generate, we investigate the potential of web pages’ t...

An Evaluation of Link Neighborhood Lexical Signatures to Rediscover Missing Web Pages

For discovering the new URI of a missing web page, lexical signatures, which consist of a small number of words chosen to represent the “aboutness” of a page, have been previously proposed. However, prior methods relied on computing the lexical signature before the page was lost, or using cached or archived versions of the page to calculate a lexical signature. We demonstrate a system of constr...

A Technique for Improving Web Mining using Enhanced Genetic Algorithm

World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines used conventional methods to retrieve information on the Web; however, the search results of these engines are still able to be refined and their accuracy is not high enough. One of the methods for web mining is evolutionary algorithms which search according to the user interests...

A New Hybrid Method for Web Pages Ranking in Search Engines

There are many algorithms for optimizing the search engine results, ranking takes place according to one or more parameters such as; Backward Links, Forward Links, Content, click through rate and etc. The quality and performance of these algorithms depend on the listed parameters. The ranking is one of the most important components of the search engine that represents the degree of the vitality...

Journal:
  • CoRR

Volume: abs/1002.2439

Publication date: 2010